1 Introduction

Discovering latent representations of the observed world has become increasingly
more relevant in the artificial intelligence literature [Hinton and Salakhutdinov,
2006; Bengio and Le Cun, 2007]. Much of the effort concentrates on building latent
variables which can be used in prediction problems, such as classification and
regression.

A related goal of learning latent structure from data is that of identifying 
which hidden common causes generate the observations. This becomes relevant 
in applications that require predicting the effect of policies. 

As an example, consider the problem of identifying the effects of the "indus- 
trialization level" of a country on its "democratization level" across two different 
time points. Democratization levels and industrialization levels are not directly 
observable: they are hidden common causes of observable indicators which can 
be recorded and analyzed. For instance, gross national product (GNP) is an 
indicator of industrialization level, while expert assessments of freedom of press 
can be used as indicators of democratization. Extended discussions on the
distinction between indicators and the latent variables they measure can be
found in the literature of structural equation models [Bollen, 1989] and
error-in-variables regression [Carroll et al., 1995].



Causal networks can be used as a language to represent this information. We
postulate a graphical encoding of causal relationships among random variables,
where vertices in the graph represent random variables and directed edges
$V_i \to V_j$ represent the notion that $V_i$ is a direct cause of $V_j$. Formal
definitions of direct causation and causal networks are given by Spirtes et al.
[2000] and Pearl [2000].

In our setup, we explicitly represent latent variables of interest as vertices
in the graph. For example, in Figure 1 we have a network representation for
the problem of causation between industrialization and democratization levels.
This model makes assumptions about the connections among latent variables
themselves: e.g., industrialization causes democratization, and the possibility of
unmeasured confounding between industrialization and democratization is not
taken into account (which, of course, can be criticized and refined).

Figure 1: A causal diagram connecting industrialization levels (IL) of a country,
in 1960, to democratization levels in 1960 and 1965 ($DL_{60}$ and $DL_{65}$,
respectively). In our diagrams, we follow the standard structural equation modeling
notation and use square vertices to represent observable variables and circles to
represent latent variables [Bollen, 1989]. The industrialization indicators are:
$Y_1$, gross national product; $Y_2$, energy consumption per capita; $Y_3$,
percentage of labor force in industry. The democratization indicators are:
$Y_4/Y_8$, freedom of press; $Y_5/Y_9$, freedom of opposition; $Y_6/Y_{10}$,
fairness of elections; $Y_7/Y_{11}$, elective nature of legislative body. Details
are given by Bollen [1989].



Following the mixed graph notation [Richardson and Spirtes, 2002; Spirtes et al.,
2000; Pearl, 2000], we also use bi-directed edges $V_i \leftrightarrow V_j$ to
denote implicit paths due to latent common causes. That is, $V_i \leftrightarrow V_j$
denotes a set of causal paths (e.g., $V_i \leftarrow X \to V_j$) that originate from
common causes that have been marginalized (such as $X$ in the previous example), as
discussed in full detail by Richardson and Spirtes [2002]. The distinction between
"explicit" and "implicit" latent variables is problem dependent: if we do not wish
to establish causal effects for some hidden variables, then they can be
marginalized.

Establishing the causal connections among latent variables is an important 
causal question, but it is only meaningful if such hidden variables are connected 
to our observations. A complementary and perhaps even more fundamental 
problem is that of finding which latent variables are out there, and how they 
cause the observed measures. 

This will be the main problem tackled in our contribution: given a dataset
of indicators assumed to be generated by unknown and unmeasured common
causes, we wish to discover which hidden common causes those are, and how
they generate our data. Using a definition from the structural equation modeling
literature, we say we are interested in learning the measurement model of our
problem [Bollen, 1989].

In the context of the example of Figure 1, suppose we are given a dataset
with 11 indicators, and wish to unveil the respective latent common causes and
measurement model. Assuming Figure 1 as the unknown gold standard, we are
successful if we predict that $\{Y_1, Y_2, Y_3\}$ are generated by a particular hidden
common cause, $\{Y_4, \dots, Y_7\}$ are generated by another hidden common cause and
so on. Notice that interpreting the resulting latent variables and linking them
to real entities and possible interventions requires knowledge of the domain.
However, their existence and relationship to the observations follow from our
data and assumptions, not from a post hoc interpretation.

The solution to this problem lies at the intersection of artificial intelligence 
techniques to infer causal structures, statistical models and the exploitation of 
assumptions commonly made in some applied sciences such as psychology and 
social sciences. 

Success will depend on how structured the real-world causal network is and 
how valid our assumptions are. If the postulated true network that generated 
our data is not sparse, for instance, then there will be so many models com- 
patible with the observed data that no useful conclusion can be made. This 
situation, however, is not different from the limitations of standard causal
network discovery procedures (with no latent variables) [Spirtes et al., 2000],
which rely on the existence of many conditional independence constraints.

We describe our assumptions and a formal problem statement in full detail
in Section 2. An algorithm to tackle the stated problem is provided in Section 3.
Experiments are shown in a later section, followed by a Conclusion. Before that,
however, we discuss the current common practice for unveiling the causal
measurement structure of the world, and why it falls short of providing a
reasonable solution.



1.1 A motivating example 

Factor analysis is still the method of choice for suggesting hidden common cause
models in the sciences. A detailed description of the method within the context
of psychology and social sciences is given by Bartholomew and Knott [1999].
In this section, we will illustrate the weaknesses of factor analysis. This
motivates the need for more advanced methods resulting from artificial intelligence
techniques in causal discovery.

In a nutshell, the main assumption of factor analysis states that each observed
variable $Y_i$ should be the effect of a set of latent factors
$\mathbf{X} = \{X_1, \dots, X_L\}$ plus some independent error term $\epsilon_i$:

$$Y_i = \sum_{j=1}^{L} \lambda_{ij} X_j + \epsilon_i \qquad (1)$$

Variables are assumed to be jointly Gaussian, although this is not strictly
necessary. The measurement model is given by the coefficients $\{\lambda_{ij}\}$ and
variances of the error terms $\{\epsilon_i\}$. Learning the measurement model is the
key task, which is required in order to understand what the hidden common causes
should represent in the real world. The factor analysis model is agnostic with
respect to the causal structure of $\mathbf{X}$, but knowing the measurement model
would also help us to learn the causal structure among latent variables [Spirtes et
al., 2000]. In the following discussion, we will assume that we know how many
latent variables exist, and then illustrate how such a widely used method is
unreliable even under this highly favorable circumstance.

Given the observed covariance matrix of $\mathbf{Y} = \{Y_1, \dots, Y_p\}$, it is
possible to infer the coefficients $\lambda_{ij}$ and the covariance matrix of
$\mathbf{X}$, but not in a unique way. Without going into details, there are ways of
choosing a solution among this equivalence class such that the measurement structure
is as simple "as possible" (within the selection criterion of choice) [Bartholomew
and Knott, 1999]. Simplicity here means having many coefficients $\{\lambda_{ij}\}$
set to zero, indicating that each observed variable measures only a few of the
latent variables. Getting the correct sparse structure is essential in order to
interpret what the hidden common causes are. Notice that this corresponds to a
directed causal network, where non-zero coefficients are encoded as directed edges
in the graph.
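The non-uniqueness mentioned above can be checked numerically. The following is a minimal sketch, assuming the standard factor analysis covariance decomposition $\Lambda \Phi \Lambda^T + \Psi$ (loadings $\Lambda$, latent covariance $\Phi$, diagonal error covariance $\Psi$); the specific numbers and the matrix `A` are arbitrary illustrative choices, not values from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_latents = 6, 2

# Hypothetical measurement model: dense loadings, correlated latents.
Lam = rng.normal(size=(p, n_latents))
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Psi = np.diag(rng.uniform(0.1, 0.5, size=p))
Sigma = Lam @ Phi @ Lam.T + Psi

# Any invertible A yields a different model with the same covariance:
# new loadings Lam A^{-1}, new latent covariance A Phi A^T.
A = np.array([[1.0, 0.5], [-0.4, 1.2]])
Lam_alt = Lam @ np.linalg.inv(A)
Phi_alt = A @ Phi @ A.T
Sigma_alt = Lam_alt @ Phi_alt @ Lam_alt.T + Psi

same_cov = np.allclose(Sigma, Sigma_alt)
different_loadings = not np.allclose(Lam, Lam_alt)
```

Both models imply the same observed covariance matrix, which is why a rotation criterion is needed to pick one member of the equivalence class.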

Such methods will work when the true model that generated the data is in
fact a "simple structure," or a "pure measurement model," in the sense that
each observed variable has a single parent in the corresponding causal network.
However, any deviation from this simple structure will strongly compromise the
result.

We provide an example in Figure 2. We generated data from a linear causal
model that follows the causal diagram of Figure 2(a).^1 Given data for the
observed variables $Y_1, \dots, Y_6$, we ideally would like to get a structure such as
the one in Figure 2(b), where the question marks emphasize that labels for the latent
variables should be provided by background knowledge. Notice that in this
contribution our aim is not to learn the structure connecting the latent variables,
and the bi-directed edge in this case denotes an arbitrary causal connection.
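To make the data-generating setup concrete, the sketch below simulates a linear latent variable model using the sampling scheme stated in the chapter's footnote (coefficients uniform on $[-1.5, -0.5] \cup [0.5, 1.5]$, error variances uniform on $[0, 0.5]$) and the sample size of 200 from the Figure 2 caption. The graph itself is an assumption made for illustration (two latents with three pure indicators each and $X_1 \to X_2$); it is not the structure of Figure 2(a). The names `draw_coefficients` and `simulate` and the latent noise variance are likewise hypothetical.

```python
import numpy as np

def draw_coefficients(rng, size):
    """Uniform on [-1.5, -0.5] U [0.5, 1.5]: random magnitude, random sign."""
    magnitude = rng.uniform(0.5, 1.5, size=size)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def simulate(n=200, seed=0):
    rng = np.random.default_rng(seed)
    lam = draw_coefficients(rng, 6)           # one loading per indicator
    beta = draw_coefficients(rng, 1)[0]       # coefficient of X1 -> X2
    err_var = rng.uniform(0.0, 0.5, size=6)   # indicator error variances

    x1 = rng.normal(size=n)
    # Assumed latent noise variance of 0.25 (not specified in the text).
    x2 = beta * x1 + rng.normal(scale=0.5, size=n)

    # Assumed pure structure: Y1..Y3 measure X1, Y4..Y6 measure X2.
    parents = np.column_stack([x1, x1, x1, x2, x2, x2])
    noise = rng.normal(size=(n, 6)) * np.sqrt(err_var)
    return parents * lam + noise, lam

Y, lam = simulate()
```

The sample covariance `np.cov(Y, rowvar=False)` then plays the role of the estimated covariance matrix of the observed variables.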

Factor analysis fails to provide sensible answers. Figure 2(c) shows a common
outcome when we indicate that the model should have two hidden common
causes. There exists no theory that provides a clear interpretation for these
edges. Even worse, results can easily become meaningless. In Figure 2(d), we
depict the result of exactly the same procedure, but where we now allow for
three hidden common causes. The method we describe in our contribution is
able to recover Figure 2(b).



2 Problem statement and assumptions 



Assume that our data follows a distribution $V$ generated according to a
directed acyclic causal graph (DAG) $\mathcal{G}$ [Spirtes et al., 2000; Pearl, 2000]
with observed nodes $\mathbf{Y}$ and hidden nodes $\mathbf{X}$. We also assume that the
resulting distribution $V$ is faithful to $\mathcal{G}$ [Spirtes et al., 2000], that
is, a conditional independence constraint holds in $V$ if and only if it holds in
$\mathcal{G}$ (using the common criterion of d-separation; please refer to Pearl
[2000] for a definition and examples). These are all standard assumptions from the
causal discovery literature. Our more particular assumptions are:



^1 Coefficients were generated uniformly at random on the interval
$[-1.5, -0.5] \cup [0.5, 1.5]$, while variances of error terms were generated
uniformly in $[0, 0.5]$.
Figure 2: In (a), we show a synthetic structure from which we generated 200
data points. Our algorithm is able to reconstruct the causal graphical structure
in (b), which captures several features of the original model. Bi-directed edges
represent conditional association and the possibility of some unidentified set of
hidden common causes of the corresponding vertices. Inferred latent variables
are not labeled, but can then be interpreted from the resulting structure and
the explicit assumptions discussed in Section 2. In (c), we show the structure
obtained with exploratory factor analysis using two latent variables and
the promax rotation technique. In (d), we show the factor analysis result obtained
when attempting to fit three latent variables.



• no observed node $Y \in \mathbf{Y}$ is a parent in $\mathcal{G}$ of any hidden node
$X \in \mathbf{X}$;

• each random variable in $\mathbf{Y} \cup \mathbf{X}$ is a linear combination of its
parents, plus additive noise, as in Equation (1).

The first assumption is motivated by applications in structural equation
modeling [Bollen, 1989], where prior knowledge is used to distinguish between
standard indicators and "causal indicators," which are causes of the latent
variables of interest. Both of these assumptions can be relaxed to some extent,
although any claims concerning the resulting causal structures learned from
data will be weaker. Silva et al. [2006] discuss the details.

For the purposes of simplifying the presentation of this chapter, we also
introduce the following two assumptions:

• no observed node $Y \in \mathbf{Y}$ is an ancestor in $\mathcal{G}$ of any other
observed node;

• every pair of observed nodes in $\mathbf{Y}$ has a common latent ancestor^2 in
$\mathcal{G}$.

These two assumptions can be dropped without any loss of generality [Silva et al.,
2006], but they will be useful for presentation purposes. Notice that the latter
assumption implies that there are no conditional independence constraints in
the marginal distribution of $\mathbf{Y}$. As such, standard causal inference
algorithms such as the PC algorithm [Spirtes et al., 2000] cannot provide any
information.

^2 As a reminder, this is not the same as having parents in common.

Notice also that we do not assume any other form of background knowledge 
concerning the number of latent variables or particular information concerning 
which observed variables have common hidden parents. 

Having clarified all assumptions on which our methods rely, the problem we
want to solve can be formalized. Let the measurement model $\mathcal{M}$ of
$\mathcal{G}$ be a graph given by all vertices of $\mathcal{G}$, and the edges of
$\mathcal{G}$ that connect latent variables to observed variables. In order to be
agnostic with respect to the causal structure among latent variables, we connect
each pair of latent variables by a bi-directed edge as a general symmetric
representation of dependency. Ideally, given the distribution $V$ over the observed
variables and that our assumptions hold, we would like to reconstruct $\mathcal{M}$.
Since $V$ has to be estimated from the data, it is of practical interest to use only
features of $V$ that can be easily estimated while also accounting for Gaussian
distributions. As such, we rephrase our problem as learning $\mathcal{M}$ given
$\Sigma$, the covariance matrix of $\mathbf{Y}$.

However, in general this is only possible if the true model entails that $\Sigma$ is
constrained in ways that cannot be explained by other models. For instance, if
there are more latent variables than observed variables, and each latent variable
is a parent of all elements of $\mathbf{Y}$, then $\Sigma$ has no constraints and an
infinite number of models will be compatible with the data.



Silva et al. [2006] formalize the problem by extracting only pure measurement
submodels of the true model, subgraphs of $\mathcal{M}$ where each observed variable
$Y$ has a single parent, and where this parent d-separates $Y$ from all other
vertices of the submodel in $\mathcal{G}$. Such single-parent vertices are also called
pure indicators. Moreover, the output of the procedure described by Silva et al.
[2006] only generates submodels where each latent variable has at least three pure
indicators. If such models exist, they can be discovered given $\Sigma$. The
scientific motivation is that many datasets studied through structural model
analysis and factor analysis support the existence of pure measurement submodels.
As we mentioned in the previous section, methods for providing "simple structures"
in factor analysis are hard to justify unless some pure measurement submodel
exists. Therefore, it would be hard to justify factor analysis as a more flexible
approach, since its output would be unreliable anyway. An important advantage
of the causal discovery approach discussed here is that it knows its limitations.

Our contribution is to extend the work of Silva et al. by allowing several
"impurities" in the output of our new procedure. To give an example where
this is necessary, consider Figure 2(a) again. It is not possible to include both
latent variables using the procedure of Silva et al.: if latent variables $X_1$ and
$X_2$, and their respective three indicators, are included, it turns out that one of
these indicators is not d-separated from $Y_4$ by either $X_1$ or $X_2$. The best
Silva et al. [2006] can do is to include, say, $X_1$ and its indicators, plus one of
its descendants as an indirect indicator which does not violate the separations in
the true model. For instance, the model with edges $X_1 \to Y_i$, for
$i \in \{1, 2, 3, 5\}$ and no other variable, satisfies this condition. In contrast,
the new procedure described here is able to generate Figure 2(b).

In practice, $\Sigma$ has to be estimated from data. In the discussion that follows,
we assume that we know $\Sigma$ so that we can concentrate on the theory and the
main ideas. Methods to deal with an estimate of $\Sigma$ are provided later.



2.1 Description of output 

Our output is a measurement pattern $\mathcal{M}_P$ which, under the above specified
assumptions and given the population matrix $\Sigma$ of a set of observed variables
$\mathbf{Y}$, provides provably correct causal claims concerning the true structure
$\mathcal{M}$. The measurement pattern is a directed mixed graph with labeled edges
(as explained below), with hidden nodes $\{L_i\}$ and observed nodes that form a
subset of $\mathbf{Y}$. $\mathcal{M}_P$ includes directed edges from latent variables
to observed variables, and bi-directed edges between observed variables.

Before introducing the new procedure in Section 3, we formalize the causal
claims that a measurement pattern $\mathcal{M}_P$ provides:

1. each hidden variable $L_i$ in $\mathcal{M}_P$ corresponds to some hidden variable
$X_j$ in $\mathcal{G}$. In the items below, we call this variable $X(L_i)$;

2. if $Y_i$ is not a child of latent variable $L_j$ in $\mathcal{M}_P$, then $Y_i$ is
independent of $X(L_j)$ in $\mathcal{G}$ given its parents in $\mathcal{M}_P$;

3. given any pure measurement submodel of $\mathcal{M}_P$ with at least three
indicators per latent variable, and a total of at least four observed variables,
then at most one of the latent-to-indicator edges $L_i \to Y_j$ does not
correspond to the true causal relationship in $\mathcal{G}$. That is, it is possible
that for one pair, $X(L_i)$ is not a cause of $Y_j$ and/or the relationship is
confounded.

The last item needs to be clarified with an example, since it is not intuitive.
Let Figure 3(a) be a true causal structure from which we can measure the
covariance matrix of $Y_1, \dots, Y_6$. The structure reported by our procedure is the
one in Figure 3(b). Five out of six edges correspond to the correct causal statement;
the exception is $L_2 \to Y_6$ (which should be confounded). From the output alone we
cannot know which edge is the exception, but at least we know that there is at most
one. As in any causal discovery algorithm [Spirtes et al., 2000; Pearl, 2000],
background knowledge is necessary to refine the information given by an equivalence
class of graphical structures.

Finally, edges are labeled as "confirmed" (they do correspond to actual paths
in the true graph) or "unconfirmed" (we cannot decide whether a corresponding
path exists in the true graph). In the next section we clarify how "unconfirmed"
edges appear. Unless otherwise stated, all other edges are "confirmed" edges.



3 An algorithm for inferring an impure measurement model

Let a one-factor submodel of $\mathcal{G}$ be a set composed of one hidden variable
$X$ and four observed variables $\{Y_A, Y_B, Y_C, Y_D\}$ such that $X$ d-separates all
four observations in $\mathcal{G}$.

Figure 3: A covariance matrix obtained from the true model in (a) will result
in the structure shown in (b). This structure should be interpreted as making
claims about an equivalence class of models: in this case, at most one of the
directed edges does not correspond to the exact claim that there is an
unconfounded causal relationship between the latent variable and its respective
child. But notice that there are only 7 models compatible with this claim, instead
of the $2^6 = 64$ possibilities (each of the six relations $L_i \to Y_j$ being
confounded or not) that the same adjacencies could provide.

One-factor submodels play an important role in our procedure. A vertex
$Y_i$ will be included in our output measurement pattern $\mathcal{M}_P$ if and only
if it belongs to some one-factor submodel of $\mathcal{G}$. Also, $X$ will correspond
to some output latent if and only if it belongs to some one-factor submodel. Figure
2(a) illustrates the concept: the sets $\{X_1, Y_1, Y_2, Y_3, Y_5\}$ and
$\{X_2, Y_1, Y_4, Y_5, Y_6\}$ are one-factor submodels. No one-factor submodels exist
for $X_3$ and $X_4$.

This fact should not be surprising. It is well known in the structural equation
modeling literature that the following model is testable:

$$Y_i = \lambda_i X + \epsilon_i \qquad (2)$$

where $i \in \{1, 2, 3, 4\}$, and $X$ and $\{\epsilon_i\}$ are mutually independent
Gaussian variables of zero mean. This corresponds to a Gaussian causal network with
corresponding edges $X \to Y_i$. Adding an extra edge, and hence a new parameter,
would remove one degree of freedom and make the model indistinguishable from
models with two latent variables [Silva et al., 2006].



One way to characterize which constraints are entailed by this model is by
writing down the tetrad constraints of this structure. If $\sigma_{ij}$ is the
covariance of $Y_i$ and $Y_j$, and $\sigma_X^2$ is the variance of $X$, then for the
model (2) the following identity holds:

$$\sigma_{12}\sigma_{34} = \lambda_1\lambda_2\sigma_X^2 \times \lambda_3\lambda_4\sigma_X^2 = \lambda_1\lambda_3\sigma_X^2 \times \lambda_2\lambda_4\sigma_X^2 = \sigma_{13}\sigma_{24} \qquad (3)$$

Similarly, $\sigma_{12}\sigma_{34} = \sigma_{14}\sigma_{23}$. For a set of four
variables $\{Y_A, Y_B, Y_C, Y_D\}$, we represent the statement
$\sigma_{AB}\sigma_{CD} = \sigma_{AC}\sigma_{BD} = \sigma_{AD}\sigma_{BC}$ by the
predicate $\mathcal{T}(ABCD)$. Notice this is entailed by the graphical structure,
since the relationship does not depend on the precise values of $\{\lambda_i\}$ or
$\sigma_X^2$.
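The tetrad predicate can be evaluated directly on a population covariance matrix. The sketch below is a minimal illustration, assuming the one-factor model (2) with arbitrary illustrative loadings; the function names `one_factor_cov` and `tetrad_holds` and the numerical tolerance are hypothetical choices. With sample covariances a statistical test would be needed instead of a fixed tolerance.

```python
import numpy as np

def one_factor_cov(lam, var_x, err_var):
    """Covariance implied by Y_i = lam_i * X + eps_i with independent errors."""
    lam = np.asarray(lam, dtype=float)
    return var_x * np.outer(lam, lam) + np.diag(err_var)

def tetrad_holds(S, a, b, c, d, tol=1e-9):
    """T(ABCD): s_ab*s_cd = s_ac*s_bd = s_ad*s_bc, up to tolerance."""
    t1 = S[a, b] * S[c, d]
    t2 = S[a, c] * S[b, d]
    t3 = S[a, d] * S[b, c]
    return abs(t1 - t2) < tol and abs(t1 - t3) < tol

S = one_factor_cov([0.8, -1.2, 0.6, 1.4], var_x=2.0,
                   err_var=[0.1, 0.2, 0.3, 0.4])
holds = tetrad_holds(S, 0, 1, 2, 3)

# Perturbing a single covariance (as an extra edge between Y_1 and Y_2
# would) breaks the constraint.
S_broken = S.copy()
S_broken[0, 1] = S_broken[1, 0] = S[0, 1] + 0.5
broken = tetrad_holds(S_broken, 0, 1, 2, 3)
```

The constraint holds for any choice of loadings and variances in the one-factor model, which is what makes it a property of the graph rather than of the parameters.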

For the causal discovery goal, however, the relevant concept is the converse:
given observable constraints that can be tested, which causal structures
are compatible with them? Concerning one-factor submodels, the converse has
been proved^3 by Silva et al. [2006]:

Fact 1 If $\mathcal{T}(ABCD)$ is true, then there is a latent variable in
$\mathcal{G}$ that d-separates $\{Y_A, Y_B, Y_C, Y_D\}$.

For example, in Figure 3(a), $X_1$ d-separates $\{Y_1, Y_2, Y_3, Y_4\}$, although it
is not a cause of $Y_4$. A result such as Fact 1 is important for discovering latent
variables, but it is of limited use unless there are ways of ruling out the
possibility that some latent variables are causes of some indicators. It turns out
that the $\mathcal{T}(\cdot)$ constraint can also be used for this purpose.

Consider Figure 3(a) again. If we pick all three indicators of one latent
variable along with some indicator of the other latent variable, we have a
one-factor model that passes the conditions of Fact 1. One possibility is that all
six indicators are pure indicators of a single latent cause: after all, each pair
$\{Y_A, Y_B\}$ is d-separated by some single latent variable. However, this does not
tell us whether the latent variable that separates one group is the same as the
one that separates another group. This is clear from Figure 3(a): $X_1$ d-separates
any pair in $\{Y_1, Y_2, Y_3\} \times \{Y_4, Y_5, Y_6\}$. However, it does not
d-separate any pair in $\{Y_4, Y_5, Y_6\} \times \{Y_4, Y_5, Y_6\}$. We have to deduce
this information without looking at the true graph, but only at the marginal
covariance matrix of $\mathbf{Y}$.

One way of discarding connections from latents to indicators, and deducing 
that two unobserved variables are not the same, is given by the following result: 

Fact 2 Consider the observed variables $\{Y_A, Y_B, Y_C, Y_D, Y_E, Y_F\}$. If both
$\mathcal{T}(ABCD)$ and $\mathcal{T}(ADEF)$ are true, but
$\sigma_{AB}\sigma_{DE} \neq \sigma_{AD}\sigma_{BE}$, then $Y_A$ and $Y_D$
cannot have any common parent in $\mathcal{G}$.

A detailed proof is given by Silva et al. [2006]. The intuitive explanation is
that, if $Y_A$ and $Y_D$ did have a common parent (say, $X_{AD}$), then this latent
variable would be precisely the one, and only one, responsible for both constraints
$\mathcal{T}(ABCD)$ and $\mathcal{T}(ADEF)$. It would not be hard to show that this
would imply $\sigma_{AB}\sigma_{DE} = \sigma_{AD}\sigma_{BE}$, contrary to the
assumption.
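The contrast behind Fact 2 can be seen on two small population covariance matrices. This is a sketch under assumed models that are not taken from the chapter's figures: a pure two-latent model ($X_1 \to \{Y_A, Y_B, Y_C\}$, $X_2 \to \{Y_D, Y_E, Y_F\}$, correlated latents) and a single-latent model with six indicators. All function names and numbers are illustrative.

```python
import numpy as np

def latent_model_cov(loadings, latent_cov, err_var):
    """Covariance Lam Phi Lam^T + Psi of a linear latent variable model."""
    Lam = np.asarray(loadings, dtype=float)
    Phi = np.asarray(latent_cov, dtype=float)
    return Lam @ Phi @ Lam.T + np.diag(err_var)

def tetrad(S, a, b, c, d, tol=1e-9):
    t1, t2, t3 = S[a, b] * S[c, d], S[a, c] * S[b, d], S[a, d] * S[b, c]
    return abs(t1 - t2) < tol and abs(t1 - t3) < tol

def fact2_rules_out_common_parent(S, a, b, c, d, e, f, tol=1e-9):
    """True when the premises of Fact 2 hold for the roles (A, ..., F)."""
    if not (tetrad(S, a, b, c, d, tol) and tetrad(S, a, d, e, f, tol)):
        return False
    return abs(S[a, b] * S[d, e] - S[a, d] * S[b, e]) > tol

err_var = [0.2, 0.3, 0.4, 0.3, 0.2, 0.1]

# Two correlated latents, three pure indicators each (assumed example).
two_latents = [[0.9, 0], [1.1, 0], [-0.7, 0], [0, 1.2], [0, 0.8], [0, -1.3]]
S_two = latent_model_cov(two_latents, [[1.0, 0.6], [0.6, 1.36]], err_var)

# One latent with six pure indicators (assumed example).
one_latent = [[0.9], [1.1], [-0.7], [1.2], [0.8], [-1.3]]
S_one = latent_model_cov(one_latent, [[1.0]], err_var)

separate_parents = fact2_rules_out_common_parent(S_two, 0, 1, 2, 3, 4, 5)
shared_parent = fact2_rules_out_common_parent(S_one, 0, 1, 2, 3, 4, 5)
```

In the two-latent model the inequality holds whenever the latents are not perfectly correlated, so $Y_A$ and $Y_D$ are declared to have no common parent; in the single-latent model the two products coincide and nothing is ruled out.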

Notice that these two results are already enough to find a pure measurement
submodel. The general skeleton of the procedure is to find a partition
$\{M_1, \dots, M_c\}$ such that $\cup_i M_i \subseteq \mathbf{Y}$ and

1. elements in $M_i$ are d-separated by some hidden variable (using Fact 1);

2. elements in $M_i$ and $M_j$ cannot have common parents (using Fact 2).

Many more details need to be figured out in order to build an equivalence
class of pure measurement models with three indicators per latent variable,
but this is the general idea. What is missing from this procedure is a way of
coping with impure measurement models so that a structure such as the one in
Figure 2(b) can be obtained. We now introduce the first theoretical results that
accomplish that.

^3 To avoid unnecessary repetition, from now on we establish the convention that all
results use the assumptions of Section 2 without explicitly mentioning them in the
theoretical development.

3.1 Finding impure indicators 

Consider what can happen if we observe the covariance matrix generated by
the model of Figure 2(a). We know that there is no single latent variable that
d-separates (say) $\{Y_1, Y_2, Y_3, Y_4\}$. However, we know that there is some hidden
$L$ that d-separates $\{Y_1, Y_2, Y_3, Y_5\}$, as well as some hidden $L'$ that
d-separates $\{Y_1, Y_4, Y_5, Y_6\}$. So far, it could be the result of a graph such as
the following:




However, we cannot stop here and report this as a possible solution: we will 
get an inconsistent estimate for the covariance of the latent variables, which can 
lead to wrong conclusions about the causal structure of the latents. We would 
like to account for the possibility that the impurities arise not from our identified 
latents, but from some other source. This is the result summarized by Lemma 3: 

Lemma 3 Consider the observed variables $\{Y_A, Y_B, Y_C, Y_D, Y_E, Y_F\}$. If the
following predicates are true:

$\mathcal{T}(ABCE)$, $\mathcal{T}(ABCF)$, $\mathcal{T}(ADEF)$, $\mathcal{T}(BDEF)$

and the following predicates are false:

$\mathcal{T}(ABEF)$, $\mathcal{T}(ABCD)$, $\mathcal{T}(CDEF)$

then in the corresponding causal graph $\mathcal{G}$, we have that:

• $\mathcal{G}$ contains at least two different latent variables, $L_1$ and $L_2$;

• $L_1$ d-separates all pairs in $\{Y_A, Y_B, Y_C\} \times \{Y_D, Y_E, Y_F, L_2\}$,
except $Y_C \times Y_D$;

• $L_2$ d-separates all pairs in $\{Y_A, Y_B, Y_C, L_1\} \times \{Y_D, Y_E, Y_F\}$,
except $Y_C \times Y_D$;

• $Y_C$ and $Y_D$ have extra hidden common causes not in $\{L_1, L_2\}$.
Figure 4: A model such as the one in (a) generates the measurement pattern
in (b). Notice that the indication of extra hidden common causes, as
represented by, e.g., $Y_1 \leftrightarrow Y_2$, does not imply that these
unrepresented causes are independent of the represented ones. Notice also that if
the direction of the edge between $X_1$ and $X_2$ in (a) was reversed, the
corresponding pattern would still be the one in (b). In this case, it is clear that
both edges $L_1 \to Y_1$ and $L_1 \to Y_2$ are not representing the actual causal
directions of the true graph. However, the corresponding measurement pattern claim
is about pure submodels. In this case, a pure submodel could be the one containing
$Y_1$, $Y_3$, $Y_4$ and $Y_5$ only. One edge, $L_1 \to Y_1$, still does not
explicitly indicate the confounding given by $X_1$, but this is compatible with the
measurement pattern description.



A formal proof of a slightly more general result is given by Silva [2006]. The
core argument is as follows. The existence of $L_1$ and $L_2$ follows from Fact 1 and
the constraints $\mathcal{T}(ABCE)$ and $\mathcal{T}(ADEF)$. That $L_1 \neq L_2$
follows from Fact 2 and the fact that $\mathcal{T}(ABEF)$ is false. The other
d-separations follow from Fact 1 and the given tetrad constraints. Finally, if $Y_C$
and $Y_D$ did not have any other hidden common cause, we could not have both
$\mathcal{T}(ABCD)$ and $\mathcal{T}(CDEF)$ falsified at the same time, contrary to
our hypothesis.
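The pattern of true and false predicates in Lemma 3 can be reproduced on a small example. The sketch below assumes a model with two correlated latents, three indicators each, and one extra hidden common cause of $Y_C$ and $Y_D$, encoded here as a correlated error term that is independent of the two latents (a simplifying assumption; the lemma itself does not require independence). Names and numbers are illustrative only.

```python
import numpy as np

def tetrad(S, a, b, c, d, tol=1e-9):
    t1, t2, t3 = S[a, b] * S[c, d], S[a, c] * S[b, d], S[a, d] * S[b, c]
    return abs(t1 - t2) < tol and abs(t1 - t3) < tol

A, B, C, D, E, F = range(6)
MUST_HOLD = [(A, B, C, E), (A, B, C, F), (A, D, E, F), (B, D, E, F)]
MUST_FAIL = [(A, B, E, F), (A, B, C, D), (C, D, E, F)]

def lemma3_applies(S):
    """Check the premises of Lemma 3 for the fixed roles A, ..., F."""
    return (all(tetrad(S, *q) for q in MUST_HOLD)
            and not any(tetrad(S, *q) for q in MUST_FAIL))

# L1 -> {A, B, C}, L2 -> {D, E, F}, latents correlated.
Lam = np.zeros((6, 2))
Lam[[A, B, C], 0] = [0.9, 1.1, -0.7]
Lam[[D, E, F], 1] = [1.2, 0.8, -1.3]
Phi = np.array([[1.0, 0.6], [0.6, 1.36]])

Psi_pure = np.diag([0.2, 0.3, 0.4, 0.3, 0.2, 0.1])
Psi_impure = Psi_pure.copy()
Psi_impure[C, D] = Psi_impure[D, C] = 0.15   # extra common cause of C and D

S_impure = Lam @ Phi @ Lam.T + Psi_impure
S_pure = Lam @ Phi @ Lam.T + Psi_pure

impure_detected = lemma3_applies(S_impure)
pure_detected = lemma3_applies(S_pure)
```

Without the correlated error, $\mathcal{T}(ABCD)$ and $\mathcal{T}(CDEF)$ hold, so the premises of the lemma fail and no bi-directed edge between $Y_C$ and $Y_D$ is warranted.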

Notice that we never claim that the implicit latent variables represented by
bi-directed edges are independent of the discovered latent variables. Figure 4
illustrates a case.

The second type of impurity we will account for is that of nodes that have more
than one represented latent parent.

Lemma 4 Consider the observed variables $\{Y_A, Y_B, Y_C, Y_D, Y_E, Y_F, Y_G\}$. If
the following predicates are true:

$\mathcal{T}(ABCK)$, for $K \in \{D, E, F, G\}$; $\mathcal{T}(KEFG)$, for
$K \in \{A, B, C, D\}$;

and the following predicates are false:

$\mathcal{T}(K_1K_2K_3K_4)$, for $\{K_1, K_2\} \subset \{A, B, C\}$ and
$\{K_3, K_4\} \subset \{E, F, G\}$;

$\mathcal{T}(ADEF)$, $\mathcal{T}(ABDE)$

then in the corresponding causal graph $\mathcal{G}$, we have that:

• $\mathcal{G}$ contains at least two different latent variables, $L_1$ and $L_2$;

• $L_1$ d-separates all pairs in $\{Y_A, Y_B, Y_C, Y_D\}$;

• $L_2$ d-separates all pairs in $\{Y_D, Y_E, Y_F, Y_G\}$;

• $L_1$ d-separates all pairs in $\{Y_A, Y_B, Y_C\} \times \{Y_E, Y_F, Y_G, L_2\}$, but
not $Y_D \times L_2$;

• $L_2$ d-separates all pairs in $\{Y_A, Y_B, Y_C, L_1\} \times \{Y_E, Y_F, Y_G\}$, but
not $Y_D \times L_1$.

Figure 5: The graph in (a) can be rebuilt exactly. However, without a fourth
indicator of $X_2$ (e.g., $Y_7$), this latent variable cannot be detected and the
result would be the graph in (b).

Figure 6: The graph in (a) is not fully identifiable with tetrad constraints only.
A conservative measurement pattern needs to report the structure in (b).

The nature of this result complements the previous one: instead of searching 
for evidence to remove edges from latents into indicators, this result provides 
identification of edges that cannot be removed. 

The argument again exploits Facts 1 and 2. A more detailed proof is given
by Silva [2006]. Notice the need for extra indicators in this case: this is another
illustration of the need for one-factor models for each latent variable. Without
$Y_7$ in the example of Figure 5(a), the result would be the measurement pattern
of Figure 5(b).

Notice that if there are indicators that share more than one common parent
in $\mathcal{G}$, we cannot separate them (i.e., avoid a bi-directed edge) even if
their parents are identified in the model using tetrad constraints only. Figure 6
illustrates what the measurement pattern should report. Using higher-order
constraints than tetrad constraints might be of help in this situation [Sullivant
and Talaska, 2008], but this is out of the scope of the current contribution.

To summarize: 



• Fact 1 provides the evidence to include latent variables;

• Fact 2 provides the evidence to distinguish between different latent
variables;

• Lemma 3 allows the removal of extra edges from latents into indicators
and proves the necessity of some bi-directed edges;

• Lemma 4 proves the necessity of some edges from latents into indicators,
but does not prove the necessity of adding some bi-directed edges.

3.2 Putting the pieces together 

So far, we have described how to identify particular pieces of information about
the underlying causal graph. While those results allow us to identify isolated
latent variables and to remove or confirm particular connections, we need to
combine such pieces within a measurement pattern. Unlike the procedure of
Silva et al. [2006], this pattern should be able to represent several pure
measurement submodels within a single graphical object and to possibly include
more latent variables than any pure model.

In this section, we assume that we have the population covariance matrix $\Sigma$.
We start by finding groups of variables that are potential indicators of a single
latent variable. We first build an auxiliary undirected graph $\mathcal{H}$ as
follows:

InitialPass: this procedure returns an undirected graph $\mathcal{H}$.

1. let $\mathcal{H}$ be a fully connected undirected graph with nodes $\mathbf{Y}$;

2. for all groups of six variables $\{Y_A, Y_B, Y_C, Y_D, Y_E, Y_F\}$ that form a
clique in $\mathcal{H}$, if $\mathcal{T}(ABCD)$ and $\mathcal{T}(ADEF)$ are true but
$\sigma_{AB}\sigma_{DE} \neq \sigma_{AD}\sigma_{BE}$, remove the edge $Y_A - Y_D$;

3. if for a given $Y_A$ in $\mathcal{H}$ there is no triplet $\{Y_B, Y_C, Y_D\}$ such
that $\mathcal{T}(ABCD)$ holds, then remove $Y_A$ from $\mathcal{H}$, since there will
be no one-factor model including $Y_A$;

4. return $\mathcal{H}$.

Notice that if two vertices are not adjacent in $\mathcal{H}$, they cannot possibly
be children of the same latent variable (it follows from Fact 2). This motivates us
to look for one-factor models within cliques of $\mathcal{H}$ only. In Step 3, we
discard variables not in one-factor models, since nothing informative can be claimed
about them using our methods.

In the next step, we obtain a set of tentative subgraphs, where each sub- 
graph contains a single latent variable and its indicators: 

SingleLatents: given $\mathcal{H}$, this procedure returns a set
$\mathcal{S}$ of graphs with a single latent variable each.

1. initialize $\mathcal{S}$ as the empty set;

2. for each clique $C$ in $\mathcal{H}$:

3. if there is no $\{Y_A, Y_B, Y_C\} \subset C$ and $Y_D \in \mathbf{Y}$
such that $T(ABCD)$ holds, continue to the next clique;

4. create a graph $\mathcal{G}_i$ with latent vertex $L_i$,
$i = |\mathcal{S}| + 1$, and with children given by $C$;

5. for each $\{Y_A, Y_B\} \subset C$, if there is no $Y_C \in C$ and
$Y_D \in \mathbf{Y}$ such that $T(ABCD)$ is true, then add edge
$Y_A \leftrightarrow Y_B$ to $\mathcal{G}_i$. Mark this edge as
"unconfirmed";

6. add the new graph to $\mathcal{S}$;

7. return $\mathcal{S}$.
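
As an illustration, the steps of SingleLatents can be sketched in Python under
several assumptions that are not part of the original procedure: the graph
$\mathcal{H}$ is given as an adjacency dictionary, "clique" is read as maximal
clique, and $T(ABCD)$ is evaluated exactly on a population covariance matrix
up to a numerical tolerance. The names `tetrads_hold`, `maximal_cliques`,
`single_latents` and the parameter `tol` are hypothetical.

```python
from itertools import combinations


def tetrads_hold(cov, a, b, c, d, tol=1e-9):
    """T(ABCD): the three tetrad products of a, b, c, d coincide (oracle check)."""
    p1 = cov[a][b] * cov[c][d]
    p2 = cov[a][c] * cov[b][d]
    p3 = cov[a][d] * cov[b][c]
    return abs(p1 - p2) < tol and abs(p1 - p3) < tol


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration; adj maps each node to its set of neighbours."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def single_latents(cov, adj, tol=1e-9):
    """Return one dict per tentative single-latent graph (Steps 1 to 7)."""
    observed = sorted(adj)

    def supported(a, b, c):
        # Is there some Y_D among the observed variables making T(ABCD) hold?
        return any(tetrads_hold(cov, a, b, c, d, tol)
                   for d in observed if d not in (a, b, c))

    graphs = []
    for clique in sorted(maximal_cliques(adj), key=sorted):
        members = sorted(clique)
        # Step 3: skip cliques in which no triplet is supported by a tetrad set.
        if not any(supported(a, b, c) for a, b, c in combinations(members, 3)):
            continue
        # Step 5: pairs supported by no tetrad set get an unconfirmed edge.
        unconfirmed = {(a, b) for a, b in combinations(members, 2)
                       if not any(supported(a, b, c)
                                  for c in members if c not in (a, b))}
        # Steps 4 and 6: the latent index is the current size of the set plus one.
        graphs.append({"latent": len(graphs) + 1,
                       "children": members,
                       "unconfirmed_bidirected": unconfirmed})
    return graphs
```

With finite samples, the exact comparison in `tetrads_hold` would be replaced
by a statistical test, and the enumeration of cliques is the step that makes
the worst case exponential.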

By Fact 1, every single latent variable created in this procedure exists in
$\mathcal{G}$. The rationale for Step 5 is that $L_i$ does not d-separate
$Y_A$ and $Y_B$. It is possible to confirm many such edges using an argument
similar to Lemma 4, but we leave out a detailed analysis to simplify the
presentation (see the footnote).

Finally, all single graphs are unified into a coherent measurement pattern: 

FindMeasurementPattern: returns a measurement pattern $\mathcal{M}_P$ given
$\mathcal{S}$.

1. let $\mathcal{M}_P$ be the union of all graphs in $\mathcal{S}$, where all
latents are connected by bi-directed edges;

2. for every pair $\{S_i, S_j\} \subset \mathcal{S}$ do:

3. consider all triplets $\{Y_A, Y_B, Y_C\} \subset S_i \cup S_j$ such that
$T(ABCD)$ holds for some $Y_D$. If such triplets are also in $S_i \cap S_j$,
set the children of $L_j$ to be children of $L_i$ and discard $L_j$. Set all
$L_i \rightarrow Y_K$ to be "unconfirmed" if $Y_K$ is not in $S_i \cap S_j$.
Continue to next pair;

4. for every pair $\{Y_C, Y_D\} \not\subset S_i \cap S_j$, add "unconfirmed"
edge $Y_C \leftrightarrow Y_D$ to $\mathcal{M}_P$. If Lemma 3 can be applied
to $\{Y_C, Y_D\}$ where $\{Y_A, Y_B\} \subset S_i$ and
$\{Y_E, Y_F\} \subset S_j$, then remove edges $L_j \rightarrow Y_C$ and
$L_i \rightarrow Y_D$ and mark $Y_C \leftrightarrow Y_D$ as "confirmed";

5. if $Y_j$ has more than one parent, mark all directed edges
$L_i \rightarrow Y_j$ unsupported by Lemma 4 as "unconfirmed";

6. return $\mathcal{M}_P$.

Footnote: An example: in Figure 5(b), all bi-directed edges can be confirmed,
because each of $\{Y_4, Y_5, Y_6\}$ is separated from $\{Y_1, Y_2, Y_3\}$ by
$L_1$. We can therefore isolate the failure of having a one-factor model
composed of $\{L_1, Y_1, Y_2, Y_4, Y_5\}$ down to the
$Y_4 \leftrightarrow Y_5$ edge.

Figure 7: With the true graph being (a), we obtain two cliques of variables
$\{Y_4, Y_5, Y_6, Y_7\}$ and $\{Y_5, Y_6, Y_7, Y_8\}$ in (b), since we can
discover that $Y_4$ and $Y_8$ are indicators of different variables. However,
these two cliques are related to the same true latent $X_2$ and have to be
merged. The side-effect is that we cannot confirm the edges
$L_2 \rightarrow Y_4$ and $L_2 \rightarrow Y_8$, although we know both cannot
possibly exist at the same time.

The justification for most steps follows directly from our previous results. 
To understand Step 3, however, we need an example. In Figure 7(a), we have a
true model. We can separate $Y_4$ from $Y_8$ using Fact 2. The result of
InitialPass is the graph $\mathcal{H}$ shown in Figure 7(b). Sets
$\{Y_4, Y_5, Y_6, Y_7\}$ and $\{Y_5, Y_6, Y_7, Y_8\}$ are cliques in
$\mathcal{H}$, but they refer to the same latent variable $X_2$. There will be
edges $L_2 \rightarrow Y_4$ and $L_2 \rightarrow Y_8$ in the measurement
pattern, but they will not be confirmed edges. Notice that there might be ways
of removing $L_2 \rightarrow Y_4$ and $L_2 \rightarrow Y_8$, but they are out
of the scope of our paper. Our goal is not to provide
complete identification methods, but to show the main tools and the difficulties 
of learning impure measurement models. 



4 Experiments 

In this section, we illustrate how the theory can be applied by analyzing two 
simple datasets. 

In practice, we will not know S, but only an estimate obtained from a sample.
Robust statistical procedures to score models and test constraints from finite
samples are described at length by Silva et al.



In the following experiments, we assume data are multivariate Gaussian.
Wishart's tetrad test can be used to evaluate $T(\cdot)$, which we accept as
true if the p-value for the test is greater than or equal to 0.05 [Silva et
al., 2006]. In the SingleLatents procedure, for each clique $C$ we add extra
bi-directed edges to $\mathcal{G}_i$ by a greedy search procedure: we look at
each pair of variables and evaluate the Bayesian information criterion (BIC,
Schwarz [1978]) for the model with the added edge. If the best model is better
than the current one, we keep the edge. Otherwise, we stop modifying
$\mathcal{G}_i$. An analogous procedure is performed to add bi-directed edges
in FindMeasurementPattern.
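
A minimal sketch of how a single tetrad constraint might be tested from a
sample covariance matrix is given below. It assumes the large-sample variance
of the tetrad difference in the form commonly attributed to Wishart, namely
$[D_{12} D_{34} (N+1)/(N-1) - D]/(N-2)$, where $D_{12}$ and $D_{34}$ are the
determinants of the two 2x2 diagonal blocks and $D$ is the determinant of the
4x4 covariance submatrix; this expression and the normal approximation for the
p-value should be checked against the original references before use. The
function names are hypothetical, and $T(ABCD)$ is taken to be accepted only if
all three tetrad differences pass at the 0.05 level.

```python
import math


def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total


def sub(cov, idx):
    """Covariance submatrix restricted to the indices in idx, as nested lists."""
    return [[cov[i][j] for j in idx] for i in idx]


def wishart_tetrad_pvalue(cov, n, i, j, k, l):
    """Two-sided p-value for H0: s_ik * s_jl - s_il * s_jk = 0, sample size n."""
    t = cov[i][k] * cov[j][l] - cov[i][l] * cov[j][k]
    d12 = det(sub(cov, [i, j]))
    d34 = det(sub(cov, [k, l]))
    d = det(sub(cov, [i, j, k, l]))
    # Assumed variance formula (see the lead-in); positive for a
    # positive definite covariance matrix.
    var = (d12 * d34 * (n + 1) / (n - 1) - d) / (n - 2)
    z = t / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def tetrads_accepted(cov, n, a, b, c, d, alpha=0.05):
    """Accept T(ABCD) if none of the three tetrad tests rejects at level alpha."""
    splits = [(a, d, b, c), (a, b, c, d), (a, c, b, d)]
    return all(wishart_tetrad_pvalue(cov, n, *s) >= alpha for s in splits)
```

Since the three tests are applied to the same four variables, no correction
for multiple testing is made here; that is a simplification of the sketch, not
a claim about the implementation used in the experiments.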
Figure 8: In (a), the measurement pattern that corresponds to the gold
standard. In (b), the result of our procedure. All directed edges are correct.
With small sample sizes, the BIC score tends to produce models simpler than
expected, so it is not surprising that the model lacks several of the
bi-directed edges.



In the worst-case scenario, the running time of the procedure is exponential
in the number of variables, due to the necessity of finding cliques in a graph
(the SingleLatents procedure). The examples are small and sparse enough that
this is not a problem. Some heuristics for larger problems are described by
Silva et al. [2006].



4.1 Democratization and industrialization example 

This is the study described at the beginning of Section 1 and discussed by
Bollen [1989]. A sample of 75 countries was collected. We will discuss the
outcome of our procedure and how it relates to the gold standard of Figure 1.

If the true model is indeed Figure 1 and if we had access to an oracle that
could answer exactly which tetrad constraints hold and do not hold in the true
model, then the result of our algorithm would be Figure 8(a). The result
obtained with our implementation is shown in Figure 8(b). With only 75
samples, it is not surprising that the BIC score tends to produce models with
fewer edges than expected. Still, the model reveals a lot of information
present in
the expected pattern. It also suggests ways of extending the procedure, such 
as allowing for the background knowledge that some variables have the same 
definition, but recorded over time. Recall that the resulting model was obtained 
without any extra information. 



4.2 Depression example 

The next dataset is a depression study with five indicators of self-esteem
(SELF), four indicators of depression (DEPRES) and three indicators of
impulsiveness (IMPULS). This dataset is one of the examples that accompany the
LISREL software for structural equation modeling. The depression data and the
meaning of the corresponding variables can also be found at

• http://www.ssicentral.com/lisrel/example1-2.html

A theoretical gold standard is shown in Figure 9(a). It is worth mentioning
that, treated as a Gaussian model, this graphical structure does not fit the
data: the chi-square score is 122.8 with 51 degrees of freedom. The sample
size is 204.

Figure 9: In (a), the gold standard of the depression study. The measurement
pattern is precisely the same (except for the latent connections). In (b), the
result of our procedure. The inferred model cannot contain the impulsiveness
latent variables, as it turns out that the correlations of IMPULS1 and IMPULS2
with other variables are statistically too close to zero at a 0.10 level.
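
To put the reported fit statistic (122.8 on 51 degrees of freedom) in
perspective, its tail probability can be approximated with a few lines of
Python. The sketch below uses the Wilson-Hilferty cube-root normal
approximation to the chi-square distribution rather than an exact routine, so
the result is only indicative; the function name is hypothetical.

```python
import math


def chi2_sf_wilson_hilferty(x, k):
    """Approximate P(chi-square with k degrees of freedom >= x)."""
    # Wilson-Hilferty: (X / k) ** (1/3) is roughly normal with
    # mean 1 - 2 / (9k) and variance 2 / (9k).
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_value = chi2_sf_wilson_hilferty(122.8, 51)
print(f"approximate p-value: {p_value:.2e}")
```

Under this approximation the tail probability is far below conventional
significance levels, which is consistent with the statement that the gold
standard does not fit the data as a Gaussian model.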

Our result is shown in Figure 9(b). It was impossible to find a hidden common
cause for the indicators of impulsiveness: the correlations of IMPULS1 and
IMPULS2 with the other items were just too low, and those items had to be
discarded. The only major difference from the gold standard was the assignment
of SELF5 to the incorrect latent parent (the role of IMPULS3 in the solution
is compatible with the properties of a measurement pattern). Given the number
of bi-directed connections into SELF5, however, this indicator seems
particularly problematic.

It is relevant to stress that in this study, the indicators are ordinal (in a 
to 4 scale), not continuous. We were still able to provide relevant information 
despite using a Gaussian model. In future work, methods to deal with ordinal
data will be developed. The theory for ordinal data is essentially identical,
as discussed by Silva et al. [2006]. However, non-Gaussian models need to be
used, which increases the computational cost of the procedure considerably.



5 Conclusion 

Learning measurement models is an important causal inference task in many 
applied sciences. Exploratory factor analysis is a popular tool to accomplish 
this task, but it can be unreliable and causal assumptions are often left
unclear. Better approaches are needed. Loehlin [2004] argues that while there
are several approaches to automatically learn causal structure, none can be
seen as competitors of exploratory factor analysis. Procedures such as the one
introduced by Silva et al. [2006] and extended here are important steps that
fill this gap.

The inclusion of impure indicators is an important step to make such
approaches more generally applicable. As hinted in our discussion, other
identification results to confirm or remove edges can be further developed.
Higher-order constraints in the covariance matrix, besides tetrad constraints,
are yet to be exploited [Sullivant and Talaska, 2008]. Exploring the
higher-order moments of the observed distribution (i.e., not only the
covariance matrix) has been a successful approach to identify the causal
structure of linear models [Shimizu et al., 2006], but how to adapt them to
discover a measurement model is still unclear. Finally, some progress on
allowing for non-linearities has been made [Silva and Scheines, 2005], but
more robust statistical procedures and further identification results are
necessary.